Cluster Analysis in Fitting Mixtures of Curves

Author

  • Tom Burr
Abstract

One data mining activity is cluster analysis, of which there are several types. One type deserving special attention is clustering that arises from a mixture of curves. A mixture distribution is a combination of two or more distributions. For example, a bimodal distribution could be a mixture in which 30% of the values are generated from one unimodal distribution and 70% from a second unimodal distribution. The special type of mixture we consider here is a mixture of curves in a two-dimensional scatter plot. Imagine a collection of hundreds or thousands of scatter plots, each containing a few hundred points, including background noise, but also containing from zero to four or five bands of points, each having a curved shape. In a recent application (Burr et al., 2001), each curved band of points was a potential thunderstorm event (see Figure 1), as observed from a distant satellite, and the goal was to cluster the points into groups associated with thunderstorm events. Each curve has its own shape, length, and location, with varying degrees of curve overlap, point density, and noise magnitude. Points from a curve with small noise resemble a smooth curve with very little vertical variation, but noise magnitude ranges widely, so some events show large vertical variation about the center of the band. In this context, each curve is a cluster, and the challenge is to use only the observations to estimate how many curves make up the mixture, along with their shapes and locations. To achieve that goal, a human analyst could train a classifier by assigning cluster labels by eye to all points in example scatter plots. Each point would belong either to a curved region or to a catch-all noise category, and a specialized cluster analysis would then be used to develop an approach for labeling (clustering) points generated by the same mechanism in future scatter plots.
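As a rough illustration of the two kinds of mixture described in the abstract, the Python sketch below first draws values from a 30%/70% two-component mixture and then simulates one scatter plot containing two noisy curved bands plus background noise. The function names, Gaussian components, component means, quadratic curve shapes, and noise levels are all assumptions chosen for illustration; none of them are taken from Burr et al. (2001).

```python
import random


def sample_bimodal(n, weight_first=0.3, seed=0):
    """Draw n values from a two-component Gaussian mixture (assumed components)."""
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        # Choose the component first, then draw a value from that component.
        k = 0 if rng.random() < weight_first else 1
        mean = 0.0 if k == 0 else 5.0  # hypothetical component means
        values.append(rng.gauss(mean, 1.0))
        labels.append(k)
    return values, labels


def sample_curve_mixture(n_per_curve=200, n_noise=100, seed=0):
    """Simulate one scatter plot: two curved bands plus uniform background noise."""
    rng = random.Random(seed)
    # Hypothetical curves: (shape function, vertical noise standard deviation).
    curves = [
        (lambda x: 0.5 * x * x, 0.05),            # low-noise band
        (lambda x: 2.0 + x - 0.3 * x * x, 0.4),   # high-noise band
    ]
    points = []  # each entry is (x, y, label); label 0 marks background noise
    for label, (shape, sigma) in enumerate(curves, start=1):
        for _ in range(n_per_curve):
            x = rng.uniform(0.0, 3.0)
            points.append((x, shape(x) + rng.gauss(0.0, sigma), label))
    for _ in range(n_noise):
        points.append((rng.uniform(0.0, 3.0), rng.uniform(-1.0, 5.0), 0))
    rng.shuffle(points)  # a clustering method would not see the points grouped
    return points


if __name__ == "__main__":
    values, labels = sample_bimodal(10_000)
    print("share from first component:", labels.count(0) / len(labels))
    points = sample_curve_mixture()
    print("points in scatter plot:", len(points))
```

In a simulation like this the true labels are known, so a clustering method can be scored by how well it recovers them. With real scatter plots the labels would have to come from manual labeling of example plots, as the abstract describes.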

Similar resources

tlemix: A General Framework for Robust Fitting of Finite Mixture Models in R

tlemix implements a general framework for robustly fitting discrete mixtures of regression models in the R statistical computing environment. It implements the FAST-TLE algorithm and uses the R package FlexMix as a computational engine for fitting mixtures of general linear models (GLMs) and model-based clustering in R.

Robust fitting of mixtures using the trimmed likelihood estimator

The Maximum Likelihood Estimator (MLE) has commonly been used to estimate the unknown parameters in the finite mixture of distributions via the expectation-maximization (EM) algorithm. However, the MLE can be very sensitive to outliers in the data. Various approaches that have incorporated robustness in fitting mixtures and clustering are discussed. Special attention is given to the Weighted Tri...

Robust Estimation in Gaussian Mixtures Using Multiresolution Kd-trees

For many applied problems in the context of clustering via mixture models, the estimates of the component means and covariance matrices can be affected by observations that are atypical of the components in the mixture model being fitted. In this paper, we consider for Gaussian mixtures a robust estimation procedure using multiresolution kd-trees. The method provides a fast EM-based approach to...

A general class of hierarchical ordinal regression models with applications to correlated ROC analysis

The authors discuss a general class of hierarchical ordinal regression models that includes both location and scale parameters, allows link functions to be selected adaptively as finite mixtures of normal cumulative distribution functions, and incorporates flexible correlation structures for the latent scale variables. Exploiting the well known correspondence between ordinal regression models a...

Hybrid Dirichlet mixture models for functional data

In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. ‘damaged’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of ...

Publication date: 2009